
The Battle of Ice Cream

1.- Introduction & Business Idea :

Idea Background: - US Ice Cream Spotlight -

Market Overview

The global ice cream market is projected to grow at a CAGR of 4.9% over the forecast period (2020-2025) [1].

  • The rise in income, along with an increase in demand for sweet dishes, drives the growth of this market.
  • Innovative new flavors are a major driver for the market, and the lower price of private-label ice creams is also increasing demand.
  • The sales of low-quality ice cream have been declining, due to the increased preference for premium ice cream.
  • Largest Market: North America

Three key emerging trends that will drive change across the industry [2]:

  1. Niche companies and products
    • The market continues to fragment. While leading companies own the share of voice, the total number of companies keeps increasing, making the market more crowded than ever.
    • Currently, the US is leading the way for launch activity in handcrafted ice creams, likely relating to the market’s size and maturity.
    • These products can offer innovation inspiration to other markets, specifically with regard to sourcing ingredients from local farmers and using packaging as a distinguishing selling point that celebrates – and appeals to – a sense of individuality.
  2. Health concerns
    • Claims on US ice creams and frozen novelties focus on “absence of negatives”. The largest percentage changes in claims from 2013 to 2014 were GMO-free (35% increase), hormone-free (nearly 25% increase), no additives/preservatives (20% increase), and low/no/reduced calorie (nearly 20% increase). Organic, seasonal and slimming claims all decreased between 2013 and 2014.
    • Overwhelmingly, consumers claim to be buying less ice cream/treats because they are unhealthy, as opposed to too expensive.
    • While health attributes are important, they are not essential for a third (33%) of US consumers.
    • Still, companies are offering consumers more “better for you” options in the way of Greek yogurt options, smaller portion sizes, and better ingredients including vegetables, oats, soy and dairy free.
  3. Flavors and formats
    • The most common retail purchase driver for ice cream/treats in the US is flavor (nearly 70%).
    • The latest flavor innovations take a page out of confectionery’s playbook and combine sweet with salty, including salted caramel and salted vanilla flavor combinations. In an attempt to attract the more sophisticated palates of some adults, brands are incorporating everything from bitter fruits and vegetables to cheese and alcohol, creating signature flavors.

Idea Description:

According to these market insights, it is still possible for a new player to enter a saturated market, but only under specific conditions. The target is therefore to produce organic handcrafted ice cream with signature flavors (niche product + health concerns + flavor trends) in the US, the largest market for ice cream.

The focus of this analysis is to find the best locations for a production center. The intention is not to offer specific coordinates, but to point towards an initial area that helps niche ice cream manufacturers in their decision-making.

How to determine the correct location?

  • The best industrial location is the one that minimizes the overall cost, so each location must be analyzed in terms of cost to the customer and never from a purely industrial perspective.
  • The objective of a company is that the sum of the global costs (Supply + Production + Physical Distribution) of each piece at the time of delivery to the customer is minimal.
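The cost criterion above can be sketched as a simple comparison of candidate sites. This is a minimal illustration, not part of the original analysis: the `CandidateLocation` class, the site names and every cost figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateLocation:
    """Hypothetical per-unit costs for one candidate production site."""
    name: str
    supply_cost: float        # bringing raw materials to the plant
    production_cost: float    # manufacturing at the plant
    distribution_cost: float  # delivering finished goods to customers

    @property
    def total_cost(self) -> float:
        # Global cost = Supply + Production + Physical Distribution
        return self.supply_cost + self.production_cost + self.distribution_cost


def best_location(candidates):
    """Return the candidate with the lowest total delivered cost."""
    return min(candidates, key=lambda c: c.total_cost)


# Illustrative numbers only: Site B is more expensive to produce in,
# but cheaper overall because it sits closer to the customers.
candidates = [
    CandidateLocation("Site A", supply_cost=1.0, production_cost=2.0, distribution_cost=3.0),
    CandidateLocation("Site B", supply_cost=1.5, production_cost=2.5, distribution_cost=1.0),
]
print(best_location(candidates).name)
```

The site with the cheapest production is not selected here, which is the point of judging locations by total delivered cost.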

Premises to be covered by the production centers

Consequently, the location of the production centers must meet the achievement of the following premises:

  • It must allow the market to be reached within the restocking and new-product launch times that the market demands.
  • It must optimize overall profitability rather than focus on production cost alone.
  • In line with the previous point, producing the same product at more than one site should be considered. Because concentrating on the 'core business' reduces the relative weight of production costs, an industrial split increases the purely industrial cost but can reduce the distribution cost by placing the origin of the product closer to consumption.

Parameters for the analysis according to the previous premises

  • A small factory / production center is allocated to cover physical distribution to a single city.
  • To reduce transport costs, the optimal locations will have the minimum distance between supply and physical distribution:
    • Supply: organic producers (dairy, soy, tapioca and fruit).
    • Physical distribution: a short distribution channel to retailers (indirect channel). The target is a high-density city (population over 1M) with high income, a low poverty rate and a high concentration of possible retailers (supermarkets and gourmet stores).
  • To avoid dealing with different sales taxes, supply and physical distribution will be managed within a single state. (Sales taxes in the United States are taxes placed on the sale or lease of goods and services. They are governed at the state level, and no national general sales tax exists.)
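One way to quantify the "minimum distance between supply and physical distribution" is the great-circle distance between a producer and a target city. The sketch below uses the haversine formula; choosing this metric is an assumption of the example (the analysis does not state which distance measure, if any, was computed), and road distance would differ.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates taken from the tables in this report:
# an organic operation in California and the city of Los Angeles.
farm = (33.921831, -116.946735)
los_angeles = (34.0194, -118.4108)
print(round(haversine_km(*farm, *los_angeles), 1), "km")
```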

2.- Target Audience

Logistics department of a Handcrafted Organic ice cream manufacturer.

The report highlights areas suitable for locating a production center so as to minimize the global cost of Supply + Physical Distribution.

3.- Data

Data 1 :

  • United States Department of Agriculture (USDA) - Organic integrity Database. This database is the most comprehensive source of information on organic production covering all farms and businesses that are certified to the USDA organic standard.
  • It is used to find certified organic farms or businesses that produce the required supplies (milk, fruit, soy and tapioca), and to identify the types of fruit that can be supplied in the selected states.

This dataset is freely available on the web at https://organic.ams.usda.gov/integrity/. Once the certified-products filter is populated with the required supplies and the country (United States of America), the database can be exported to Excel.

  • OID.OperationSearchResults.2020.11.11.1_11 PM.csv

Data 2 :

  • List of United States cities by population from Wikipedia, as estimated by the United States Census Bureau. We use only the cities with a population over 1M.

https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population

Data 3 :

  • Household income is an economic standard that can be applied to one household, or aggregated across a large group such as a county, city, or the whole country.
  • A key measure of household income is the median income, at which half of households have income above that level and half below.

United States Census Bureau - Median Household Income by State: 2019. Source: 2014-2018 American Community Survey 5-Year Estimates

https://www.census.gov/search-results.html?q=2019+Median+Household+Income+in+the+United+States&page=1&stateGeo=none&searchtype=web&cssp=SERP&_charset_=UTF-8

Link directs to a table that can be downloaded in csv format - Median Household Income by State.csv

Data 4 :

  • Poverty in the United States of America refers to people who lack sufficient income or material possessions for their needs. Although the United States is a relatively wealthy country by international standards, poverty has consistently been present throughout the United States.
  • The U.S. federal government uses two measures of poverty: the poverty thresholds set by the U.S. Census Bureau, used for statistical purposes, and the poverty guidelines issued by the Department of Health and Human Services, used for administrative purposes.

United States Census Bureau - Individuals Below Poverty Level by State: 2019. Source: 2020 Current Population Survey Annual Social and Economic Supplement (CPS ASEC).

https://www.census.gov/search-results.html?searchType=web&cssp=SERP&q=poverty

Link directs to a table that can be downloaded in csv format - Individuals Below Poverty Level by State.csv

Data 5 :

  • The geographical coordinates of the cities are used as input for the Foursquare API, which provides retailer information for each city.
  • Search results are limited to 50 responses, and the maximum search radius is 100,000 meters. The Foursquare API is convenient for listing the venues near a particular position, but cumbersome for acquiring clean data such as all the supermarkets in one city. To overcome these limitations we use the neighborhood structure as an area slicer, and we also limit search results by using the categoryId of Supermarket (52f2ab2ebcbc57f1066b8b46) and Gourmet Shop (4bf58dd8d48988d1f5941735).

  • Neighborhoods datasets:

New York City dataset - https://cocl.us/new_york_dataset

Chicago data origin - https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago - Chicago Neighborhoods.csv

Philadelphia data origin - https://www.visitphilly.com/areas/philadelphia-neighborhoods/ - Philadelphia Neighborhoods.csv

LA data origin - https://enacademic.com/dic.nsf/enwiki/800534 - LA Neighborhoods.csv

San Diego data origin - https://en.wikipedia.org/wiki/List_of_communities_and_neighborhoods_of_San_Diego - San Diego Neighborhoods.csv

San Jose data origin - https://en.wikipedia.org/wiki/Category:Neighborhoods_in_San_Jose,_California - San Jose Neighborhoods.csv

Houston - https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods
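The Foursquare query described under Data 5 can be sketched as a URL builder that is called once per neighborhood. The report does not show the exact request, so the endpoint (the legacy v2 `venues/search` route), the default radius, the version date and the credential placeholders are all assumptions of this example. Only the 50-result limit and the two category IDs come from the text above.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 venue search endpoint.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

# Category IDs quoted in this report.
CATEGORY_IDS = {
    "Supermarket": "52f2ab2ebcbc57f1066b8b46",
    "Gourmet Shop": "4bf58dd8d48988d1f5941735",
}


def build_search_url(lat, lng, client_id, client_secret,
                     radius_m=1000, limit=50, version="20201111"):
    """Build a venue-search URL centered on one neighborhood.

    radius_m and version are illustrative defaults, not values from the report.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius_m,
        "limit": limit,
        "categoryId": ",".join(CATEGORY_IDS.values()),
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; a real call needs valid keys.
print(build_search_url(40.8947, -73.8472, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET"))
```

Slicing each city by neighborhood keeps every request small enough that the 50-result cap is less likely to hide retailers.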

4.- Methodology

The main idea is to segment organic production at state level according to ice cream production needs on one side, and cities on the other, and then use both results to find suitable states for the production center.

In both cases we use the unsupervised learning algorithm K-means.

Libraries imported.

4.1 - Organic production segmentation

Data analysis and preparation

Even with a filtered dataset from the United States Department of Agriculture (USDA) Organic Integrity Database, the products we are looking for are spread across different categories: the CROPS, HANDLING and LIVESTOCK certified-products columns.

The first step is to restructure the data to classify organic production units according to whether or not they have the required products.

Certifier Name Certifier Website Certifier Email Address Operation ID Operation Name Other/Former Names Client ID Contact First Name Contact Last Name Operation Certification Status Effective Date of Operation Status NOP Anniversary Date CROPS Scope Certification Status Effective Date of CROPS Status Certified Products Under CROPS Scope Additional Certified Products Under CROPS Scope Certificate Numbers for Certified Products under CROPS Scope LIVESTOCK Scope Certification Status Effective Date of LIVESTOCK Status Certified Products Under LIVESTOCK Scope Additional Certified Products Under LIVESTOCK Scope Certificate Numbers for Certified Products under LIVESTOCK Scope WILD CROPS Scope Certification Status Effective Date of WILD CROPS Status Certified Products Under WILD CROPS Scope Additional Certified Products Under WILD CROPS Scope Certificate Numbers for Certified Products Under WILD CROPS Scope HANDLING Scope Certification Status Effective Date of HANDLING Status Certified Products Under HANDLING Scope Additional Certified Products Under HANDLING Scope Certificate Numbers for Certified Products Under HANDLING Scope Physical Address: Street 1 Physical Address: Street 2 Physical Address: City Physical Address: State/Province Physical Address: Country Physical Address: ZIP/ Postal Code County Code County Mailing Address: Street 1 Mailing Address: Street 2 Mailing Address: City Mailing Address: State/Province Mailing Address: Country Mailing Address: ZIP/ Postal Code County Code.1 County.1 Phone Email Website URL Additional Information Broker Community Supported Agriculture (CSA) Co-Packer Dairy Distributor Marketer/Trader Restaurant Retail Food Establishment Poultry Private Labeler Slaughterhouse Storage Grower Group Data as of Date Organic Certificate
0 Required Required Required Required Required Optional Optional Optional Optional Required Required Optional Optional Optional Required NaN Optional Optional Optional NaN NaN NaN Optional Optional NaN NaN NaN Optional Optional NaN NaN NaN Required Optional Required (US) Required (US) Required Required (US) Optional Optional Required Optional Required (US) Required (US) Required Required (US) Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Optional Required Optional
1 More information about Accredited Certifying A... From Certifier Profile From Certifier Profile NOP's 10-digit unique ID for operation. First... Operation's business name Other names that the operation is doing or has... Client ID issued by certifier. This can be any... NaN NaN Certified/ Surrendered/ Suspended/ Revoked Date the selected Operation Certification Stat... Date of annual update for certificate, at oper... Certified/ Surrendered/ Suspended MM/DD/YYYY Category: Item Name/Other Item (Item Variety) Category: Item Name/Other Item (Item Variety) Certificate numbers issued by Certifier. Certified/ Surrendered/ Suspended MM/DD/YYYY Category: Item Name/Other Item (Item Variety) Category: Item Name/Other Item (Item Variety) Certificate numbers issued by Certifier. Certified/ Surrendered/ Suspended MM/DD/YYYY Category: Item Name/Other Item (Item Variety) Category: Item Name/Other Item (Item Variety) Certificate numbers issued by Certifier. Certified/ Surrendered/ Suspended MM/DD/YYYY Category: Item Name/Other Item (Item Variety) Category: Item Name/Other Item (Item Variety) Certificate numbers issued by Certifier. At least one of the two addresses (Physical or... NaN NaN NaN NaN NaN NaN NaN At least one of the two addresses (Physical or... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Free text Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Yes as applicable Date of last update by certifier. MM/DD/YYYY Certificates are available for certified opera...
2 CCOF Certification Services, LLC www.ccof.org ccof@ccof.org 5561005508 119 Degrees West Farm NaN al596 Nicolas Anderson Certified 2018-08-19 00:00:00 NaN Certified 2018-08-19 00:00:00 Fruit - Pome: Apples, Apples NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 329 Road 11 SE NaN Othello Washington United States of America (the) 99344 025 Grant County 13821 Road A SE NaN Othello Washington United States of America (the) 99344 NaN NaN 509-750-5595 NaN NaN Learn more: www.ccof.org/members?title=119+deg... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2020-08-04 00:00:00 0
3 Washington State Department of Agriculture www.agr.wa.gov/FoodAnimal/Organic organic@agr.wa.gov 2780003077 12 Birches Farm NaN 3077 Anna Petersons Certified 2016-04-28 00:00:00 NaN Certified 2016-04-28 00:00:00 Herbs/Spices: Herbs; Fruit - Pome: Apples, Pea... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3340 Craw Road NaN Langley Washington United States of America (the) 98260 029 Island County NaN NaN www.12birchesfarm.com NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2019-12-27 00:00:00 0
4 CCOF Certification Services, LLC www.ccof.org ccof@ccof.org 5561002642 123 Farm NaN ps194dv Sharon Ko Certified 2004-09-23 00:00:00 NaN Certified 2004-09-23 00:00:00 Nursery/Starts/Flowers/Trees: Flowers, Trees; ... NaN NaN Certified 2009-11-19 00:00:00 Sheep: Sheep (Last Third) (No Organic Slaughte... NaN NaN Certified 2011-01-05 00:00:00 Herbs: Herbs NaN NaN Certified 2004-09-23 00:00:00 Other: Condiment (Lavender Balsamic Vinaigrett... NaN NaN 10600 Highland Springs Avenue NaN Cherry Valley California United States of America (the) 92223 065 Riverside County 10600 Highland Springs Avenue NaN Cherry Valley California United States of America (the) 92223 NaN NaN 951-845-1151 NaN www.123farm.com Learn more: www.ccof.org/members?title=123+far... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2020-10-05 00:00:00 0

Reduce the dataset to the columns needed for the analysis.

Operation Name Certified Products Under CROPS Scope Certified Products Under HANDLING Scope Certified Products Under LIVESTOCK Scope Physical Address: Street 1 Physical Address: State/Province
0 Required Required NaN NaN Required Required (US)
1 Operation's business name Category: Item Name/Other Item (Item Variety) Category: Item Name/Other Item (Item Variety) Category: Item Name/Other Item (Item Variety) At least one of the two addresses (Physical or... NaN
2 119 Degrees West Farm Fruit - Pome: Apples, Apples NaN NaN 329 Road 11 SE Washington
3 12 Birches Farm Herbs/Spices: Herbs; Fruit - Pome: Apples, Pea... NaN NaN NaN NaN
4 123 Farm Nursery/Starts/Flowers/Trees: Flowers, Trees; ... Other: Condiment (Lavender Balsamic Vinaigrett... Sheep: Sheep (Last Third) (No Organic Slaughte... 10600 Highland Springs Avenue California

Group the certified products into a single column to make the analysis easier, and rename the columns.

Operation Name Address State Certified Products
0 119 Degrees West Farm 329 Road 11 SE Washington Fruit - Pome: Apples, Apples--
1 123 Farm 10600 Highland Springs Avenue California Nursery/Starts/Flowers/Trees: Flowers, Trees; ...
2 1700 Stiles Rd Farm 1700 Stiles Road New York Field/Forageable: Corn, Hay, Pasture, Tritical...
3 1St Maine Farm, Hemp & Craft 58 Williams Pond Rd. Maine Tuber/Root Vegetables: Garlic, Onions, Potatoe...
4 2 Creeks Farm 19986 Maland Dr. Minnesota Field/Forageable: Corn, Pasture-Goats: Milking...

For this analysis we keep only the organic operations that can provide the supply needed: milk, fruit, soy or tapioca.

Operation Name Address State Milk Fruit Soy Tapioca
0 119 Degrees West Farm 329 Road 11 SE Washington 0 1 0 0
1 123 Farm 10600 Highland Springs Avenue California 0 1 0 0
2 1700 Stiles Rd Farm 1700 Stiles Road New York 1 0 0 0
3 1St Maine Farm, Hemp & Craft 58 Williams Pond Rd. Maine 0 1 0 0
4 2 Creeks Farm 19986 Maland Dr. Minnesota 1 0 0 0
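The classification step above can be sketched as a keyword search over the combined "Certified Products" text. The matching rules used in the notebook are not shown, so the case-insensitive keywords below are an assumption of this example and may not reproduce the 9232 in-scope operations exactly.

```python
import re

# Hypothetical keyword per required supply (case-insensitive substring match).
KEYWORDS = {
    "Milk": r"milk",
    "Fruit": r"fruit",
    "Soy": r"soy",
    "Tapioca": r"tapioca",
}


def flag_products(certified_products):
    """Return a 0/1 flag per required supply for one operation."""
    # Missing values (for example NaN from a spreadsheet) are treated as empty text.
    text = certified_products if isinstance(certified_products, str) else ""
    return {
        name: int(re.search(pattern, text, flags=re.IGNORECASE) is not None)
        for name, pattern in KEYWORDS.items()
    }


def in_scope(flags):
    """An operation is kept if it supplies at least one required product."""
    return any(flags.values())


flags = flag_products("Fruit - Pome: Apples, Apples--")
print(flags, in_scope(flags))
```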

Number of Organic Operations in scope

(9232, 7)

Second step is to group Organic Operations by State

Count operations per State

Operation Name Address Milk Fruit Soy Tapioca
State
California 2013 2013 2013 2013 2013 2013
New York 1168 1168 1168 1168 1168 1168
Wisconsin 843 843 843 843 843 843
Pennsylvania 771 771 771 771 771 771
Iowa 544 544 544 544 544 544

Count operations by category and state

Milk Fruit Soy Tapioca
State
Alabama 2 3 0 0
Arizona 1 65 2 0
Arkansas 1 6 14 0
California 163 1829 92 10
Colorado 20 76 8 2
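The per-state counts above amount to summing the 0/1 flags of every operation within each state. A minimal sketch in plain Python follows; the notebook most likely used a DataFrame group-by instead, which is not shown, and the sample rows are the five operations listed earlier in this section.

```python
from collections import defaultdict

CATEGORIES = ("Milk", "Fruit", "Soy", "Tapioca")


def count_by_state(operations):
    """Sum the 0/1 product flags of all operations, grouped by state."""
    totals = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    for op in operations:
        for category in CATEGORIES:
            totals[op["State"]][category] += op[category]
    return dict(totals)


# Sample rows copied from the table of in-scope operations.
operations = [
    {"State": "Washington", "Milk": 0, "Fruit": 1, "Soy": 0, "Tapioca": 0},
    {"State": "California", "Milk": 0, "Fruit": 1, "Soy": 0, "Tapioca": 0},
    {"State": "New York", "Milk": 1, "Fruit": 0, "Soy": 0, "Tapioca": 0},
    {"State": "Maine", "Milk": 0, "Fruit": 1, "Soy": 0, "Tapioca": 0},
    {"State": "Minnesota", "Milk": 1, "Fruit": 0, "Soy": 0, "Tapioca": 0},
]
for state, counts in sorted(count_by_state(operations).items()):
    print(state, counts)
```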

Once the dataset is ready, we add geographic coordinates (latitude and longitude) to prepare some visualizations.

Operation Name Address State Milk Fruit Soy Tapioca Full address
0 119 Degrees West Farm 329 Road 11 SE Washington 0 1 0 0 329 Road 11 SE ,Washington, United States of A...
1 123 Farm 10600 Highland Springs Avenue California 0 1 0 0 10600 Highland Springs Avenue ,California, Uni...
2 1700 Stiles Rd Farm 1700 Stiles Road New York 1 0 0 0 1700 Stiles Road ,New York, United States of A...
3 1St Maine Farm, Hemp & Craft 58 Williams Pond Rd. Maine 0 1 0 0 58 Williams Pond Rd. ,Maine, United States of ...
4 2 Creeks Farm 19986 Maland Dr. Minnesota 1 0 0 0 19986 Maland Dr. ,Minnesota, United States of ...

The geolocation process could not locate 1318 organic operations.

Initial Operations:  9232 -  Missing coordinates operations:  1318 = Geolocated -  7914
Operation Name Address State Milk Fruit Soy Tapioca lat long
0 119 Degrees West Farm 329 Road 11 SE Washington 0 1 0 0 46.927731 -119.487584
1 123 Farm 10600 Highland Springs Avenue California 0 1 0 0 33.921831 -116.946735
2 1700 Stiles Rd Farm 1700 Stiles Road New York 1 0 0 0 42.664845 -77.020312
3 1St Maine Farm, Hemp & Craft 58 Williams Pond Rd. Maine 0 1 0 0 44.646679 -68.771483
4 2 Creeks Farm 19986 Maland Dr. Minnesota 1 0 0 0 43.854260 -91.880024
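The geolocation step can be sketched with the geocoder passed in as a function, so that operations without a result are collected separately instead of being dropped silently. The geocoding service used in the notebook is not shown. The `geocode` callable, the `fake_geocode` stand-in and the default country string are assumptions of this example.

```python
def full_address(street, state, country="United States of America"):
    """Join the address parts into one query string (country default is an assumption)."""
    return f"{street}, {state}, {country}"


def geocode_operations(operations, geocode):
    """Split operations into located and missing.

    `geocode` is any callable that takes an address string and returns
    a (lat, long) tuple, or None when the address cannot be resolved.
    """
    located, missing = [], []
    for op in operations:
        coords = geocode(full_address(op["Address"], op["State"]))
        if coords is None:
            missing.append(op)
        else:
            lat, long = coords
            located.append({**op, "lat": lat, "long": long})
    return located, missing


# Stand-in for a real geocoding service, for illustration only.
KNOWN = {"329 Road 11 SE, Washington, United States of America": (46.927731, -119.487584)}


def fake_geocode(address):
    return KNOWN.get(address)


ops = [
    {"Operation Name": "119 Degrees West Farm", "Address": "329 Road 11 SE", "State": "Washington"},
    {"Operation Name": "Hypothetical Farm", "Address": "1 Unknown Road", "State": "Wisconsin"},
]
located, missing = geocode_operations(ops, fake_geocode)
print(len(ops), "-", len(missing), "=", len(located))
```

Keeping the `missing` list makes it straightforward to report which states lose the most operations.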

The next table shows the top 5 states affected by missing geolocation:

Operation Name Address Milk Fruit Soy Tapioca Full address gcode latlong
State
Wisconsin 384 384 384 384 384 384 384 384 0
New York 192 192 192 192 192 192 192 192 0
California 187 187 187 187 187 187 187 187 0
Pennsylvania 55 55 55 55 55 55 55 55 0
Minnesota 42 42 42 42 42 42 42 42 0

Visualizations of Organic Production (Milk|Fruit|Soy|Tapioca) by state

Caveat: 1318 of the 9232 operations could not be geolocated. The most affected state in the maps is Wisconsin, which loses 384 of its 843 operations (about 46%). The impact is smaller for California, New York and Pennsylvania.


Top ten States in organic operation including product categories

Interactive map to visualize product categories in US Geography

Organic production segmentation with K-Means

Pre-processing: drop the columns with discrete variables. The features are the number of organic operations per product.

State Milk Fruit Soy Tapioca lat long
3 California 163 1829 92 10 36.701463 -118.755997
4 Colorado 20 76 8 2 38.725178 -105.607717
6 Delaware 1 4 6 1 38.692045 -75.401331
11 Idaho 23 12 4 1 43.644764 -114.015407
12 Illinois 25 58 161 7 40.079661 -89.433729

Normalization is a statistical method that helps distance-based algorithms treat features with different magnitudes and distributions equally. We use StandardScaler() to rescale each feature to zero mean and unit variance.
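The scaling step can be written out by hand to show what it does to each column. This is a dependency-free sketch of the same z-score transform, using the population standard deviation as StandardScaler does. The sample rows are the five states shown in the table above.

```python
import math


def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # Population standard deviation (divide by n, not n - 1).
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        for col, m in zip(columns, means)
    ]
    # A constant column has std 0; map it to 0.0 instead of dividing by zero.
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Milk, Fruit, Soy, Tapioca counts for California, Colorado, Delaware, Idaho, Illinois.
state_features = [
    [163, 1829, 92, 10],
    [20, 76, 8, 2],
    [1, 4, 6, 1],
    [23, 12, 4, 1],
    [25, 58, 161, 7],
]
for row in standardize(state_features):
    print([round(v, 2) for v in row])
```

Without this step the Fruit column, with values in the thousands, would dominate the Euclidean distances that K-means relies on.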

Figure: Not normalized values

Figure: Normalized values

For the K-means algorithm, we need to input a K value. The KElbowVisualizer from Yellowbrick implements the “Elbow” method to select the optimal number of clusters by fitting the model with a range of values for K. The optimal K value occurs at the inflection on the curve and is shown with a dashed line.
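The quantity behind the elbow curve is the within-cluster sum of squared distances (inertia) for each candidate K. The sketch below computes it with a small pure-Python K-means. It is an illustration only: the deterministic farthest-point initialization and the toy points are assumptions of this example, whereas the notebook relies on library implementations.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids, inertia)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    inertia = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, inertia


# Toy data with two obvious groups; real input would be the scaled features.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in range(1, 5):
    print(k, round(kmeans(points, k)[2], 3))
```

On this toy data the inertia drops sharply from K=1 to K=2 and only slightly afterwards, which is the elbow that the visualizer marks.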

Figure: K-means elbow curve

Cluster Labels State Milk Fruit Soy Tapioca lat long
3 2 California 163 1829 92 10 36.701463 -118.755997
4 1 Colorado 20 76 8 2 38.725178 -105.607717
6 1 Delaware 1 4 6 1 38.692045 -75.401331
11 1 Idaho 23 12 4 1 43.644764 -114.015407
12 1 Illinois 25 58 161 7 40.079661 -89.433729

Visualizing the 3 clusters

Milk Fruit Soy Tapioca lat long
Cluster Labels
0 378.0 158.600000 306.80000 2.200000 43.262514 -86.167969
1 31.0 46.190476 40.47619 1.714286 39.752373 -91.184484
2 163.0 1829.000000 92.00000 10.000000 36.701463 -118.755997

Cluster 0 -

Cluster Labels State Milk Fruit Soy Tapioca lat long
14 0 Iowa 88 42 466 3 41.921673 -93.312270
22 0 Minnesota 128 53 281 1 45.989659 -94.611329
31 0 New York 765 319 251 1 43.000350 -75.499900
37 0 Pennsylvania 404 252 236 2 40.969989 -77.727883
49 0 Wisconsin 505 127 300 4 44.430898 -89.688464

Cluster 1 -

Cluster Labels State Milk Fruit Soy Tapioca lat long
4 1 Colorado 20 76 8 2 38.725178 -105.607717
6 1 Delaware 1 4 6 1 38.692045 -75.401331
11 1 Idaho 23 12 4 1 43.644764 -114.015407
12 1 Illinois 25 58 161 7 40.079661 -89.433729
13 1 Indiana 164 26 99 2 40.327013 -86.174693
15 1 Kansas 2 4 45 1 38.273120 -98.582187
16 1 Kentucky 45 23 38 1 37.572603 -85.155141
18 1 Maine 80 260 11 1 45.709097 -68.859020
19 1 Maryland 25 40 23 1 39.516223 -76.938207
20 1 Massachusetts 7 53 7 1 42.378877 -72.032366
26 1 Nebraska 2 3 105 1 41.737023 -99.587382
27 1 Nevada 8 19 3 1 39.515882 -116.853723
29 1 New Jersey 18 80 13 2 40.075738 -74.404162
30 1 New Mexico 4 17 1 1 34.570817 -105.993007
32 1 North Carolina 10 61 98 2 35.672964 -79.039292
33 1 North Dakota 3 8 14 2 47.620146 -100.540737
34 1 Ohio 177 26 161 1 40.225357 -82.688140
36 1 Oregon 14 84 12 4 43.979280 -120.737257
42 1 Tennessee 1 11 5 1 35.773008 -86.282008
43 1 Texas 18 70 34 2 31.816038 -99.512099
47 1 Washington 4 35 2 1 38.894992 -77.036558

Cluster 2 -

Cluster Labels State Milk Fruit Soy Tapioca lat long
3 2 California 163 1829 92 10 36.701463 -118.755997

According to the clustering:

  • Cluster 0 states show a balanced, high production of the products needed.
  • Cluster 1 states show a reduced and/or unbalanced production.
  • Cluster 2 (California) stands out for its fruit production and also supplies the other products.

4.2 - Cities over 1M segmentation

Data analysis and preparation

Preparation of United States cities by population dataset

2019rank City State[c] 2019estimate 2010Census Change 2016 land area 2016 land area.1 2016 population density 2016 population density.1 Location
0 1 New York City[d] New York 8336817 8175133 +1.98% 301.5 sq mi 780.9 km2 28,317/sq mi 10,933/km2 40°39′49″N 73°56′19″W / 40.6635°N 73.9387°W
1 2 Los Angeles California 3979576 3792621 +4.93% 468.7 sq mi 1,213.9 km2 8,484/sq mi 3,276/km2 34°01′10″N 118°24′39″W / 34.0194°N 118.4108°W
2 3 Chicago Illinois 2693976 2695598 −0.06% 227.3 sq mi 588.7 km2 11,900/sq mi 4,600/km2 41°50′15″N 87°40′54″W / 41.8376°N 87.6818°W
3 4 Houston[3] Texas 2320268 2100263 +10.48% 637.5 sq mi 1,651.1 km2 3,613/sq mi 1,395/km2 29°47′12″N 95°23′27″W / 29.7866°N 95.3909°W
4 5 Phoenix Arizona 1680992 1445632 +16.28% 517.6 sq mi 1,340.6 km2 3,120/sq mi 1,200/km2 33°34′20″N 112°05′24″W / 33.5722°N 112.0901°W

Geolocation is needed, so we split the Location column into latitude and longitude and standardize them as signed decimal degrees.

2019rank City State[c] 2019estimate 2010Census Change 2016 land area 2016 land area.1 2016 population density 2016 population density.1 lat long
0 1 New York City[d] New York 8336817 8175133 +1.98% 301.5 sq mi 780.9 km2 28,317/sq mi 10,933/km2 40.6635 -73.9387
1 2 Los Angeles California 3979576 3792621 +4.93% 468.7 sq mi 1,213.9 km2 8,484/sq mi 3,276/km2 34.0194 -118.4108
2 3 Chicago Illinois 2693976 2695598 −0.06% 227.3 sq mi 588.7 km2 11,900/sq mi 4,600/km2 41.8376 -87.6818
3 4 Houston[3] Texas 2320268 2100263 +10.48% 637.5 sq mi 1,651.1 km2 3,613/sq mi 1,395/km2 29.7866 -95.3909
4 5 Phoenix Arizona 1680992 1445632 +16.28% 517.6 sq mi 1,340.6 km2 3,120/sq mi 1,200/km2 33.5722 -112.0901
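Splitting the Location column can be sketched with a regular expression that reads the decimal-degree part after the slash and applies the sign from the hemisphere letter. The function name and the error handling are assumptions of this example. The input format is the one shown in the table above.

```python
import re

# Matches e.g. "40.6635°N 73.9387°W"
DECIMAL_COORD = re.compile(r"(\d+(?:\.\d+)?)°\s*([NS])\s+(\d+(?:\.\d+)?)°\s*([EW])")


def parse_location(location):
    """Return (lat, long) in signed decimal degrees from a Location cell."""
    # The cell holds "DMS / decimal"; keep only the decimal part.
    decimal_part = location.split("/")[-1]
    match = DECIMAL_COORD.search(decimal_part)
    if match is None:
        raise ValueError(f"Unrecognized location format: {location!r}")
    lat, ns, lon, ew = match.groups()
    lat = float(lat) * (1 if ns == "N" else -1)  # south is negative
    lon = float(lon) * (1 if ew == "E" else -1)  # west is negative
    return lat, lon


print(parse_location("40°39′49″N 73°56′19″W / 40.6635°N 73.9387°W"))
```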

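Filter the cities over 1M, drop the columns not needed for the analysis, and convert cell contents to numbers. Population density is shown in thousands of inhabitants per square mile (28.317 corresponds to 28,317/sq mi).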

City State Population 2019 Land area (sq mi) Population density/sq mi lat long
0 New York City[d] New York 8336817 301.5 28.317 40.6635 -73.9387
1 Los Angeles California 3979576 468.7 8.484 34.0194 -118.4108
2 Chicago Illinois 2693976 227.3 11.900 41.8376 -87.6818
3 Houston[3] Texas 2320268 637.5 3.613 29.7866 -95.3909
4 Phoenix Arizona 1680992 517.6 3.120 33.5722 -112.0901
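Converting cells such as "301.5 sq mi" or "8,484/sq mi" to numbers can be sketched by reading the leading number and discarding the unit. Note that this sketch treats the comma as a thousands separator, so it returns 28317.0 where the table above shows 28.317. The function name is an assumption of this example.

```python
import re

LEADING_NUMBER = re.compile(r"\s*([\d,]+(?:\.\d+)?)")


def leading_number(cell):
    """Extract the number at the start of a cell, ignoring units and thousands separators."""
    match = LEADING_NUMBER.match(cell)
    if match is None:
        raise ValueError(f"No leading number in {cell!r}")
    return float(match.group(1).replace(",", ""))


print(leading_number("301.5 sq mi"))
print(leading_number("28,317/sq mi"))
```

The unit of the density column does not change the clustering once the features are standardized, but it matters when reading the tables.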

Median household dataset preparation

State Income Margin Of Error
0 Alabama $48,486 +/- $364
1 Alaska $76,715 +/- $894
2 Arizona $56,213 +/- $275
3 Arkansas $45,726 +/- $350
4 California $71,228 +/- $217

Drop the columns not needed, remove inconsistent records and convert income to a number.

State Income
0 Alabama 48486
1 Alaska 76715
2 Arizona 56213
3 Arkansas 45726
4 California 71228

Merge the cities data with the median household income data

City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income
0 New York City[d] New York 8336817 301.5 28.317 40.6635 -73.9387 65323
1 Los Angeles California 3979576 468.7 8.484 34.0194 -118.4108 71228
2 San Diego California 1423851 325.2 4.325 32.8153 -117.1350 71228
3 San Jose California 1021795 177.5 5.777 37.2967 -121.8189 71228
4 Chicago Illinois 2693976 227.3 11.900 41.8376 -87.6818 63575

Poverty level dataset preparation

State Individuals Below Poverty Level Margin Of Error
0 Alabama 15.5% +/- 0.5%
1 Alaska 10.1% +/- 1.1%
2 Arizona 13.5% +/- 0.5%
3 Arkansas 16.2% +/- 0.6%
4 California 11.8% +/- 0.2%

Drop the columns not needed, remove inconsistent records and convert Individuals Below Poverty Level to a number.

State Individuals Below Poverty Level
0 Alabama 0.155
1 Alaska 0.101
2 Arizona 0.135
3 Arkansas 0.162
4 California 0.118
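The two text-to-number conversions used for the income and poverty tables can be sketched as small helpers. The function names are assumptions of this example. The sample values come from the tables above.

```python
def parse_currency(cell):
    """Convert a cell such as "$48,486" to the integer 48486."""
    return int(cell.replace("$", "").replace(",", "").strip())


def parse_percent(cell):
    """Convert a cell such as "15.5%" to the fraction 0.155."""
    return float(cell.strip().rstrip("%")) / 100


print(parse_currency("$48,486"), parse_percent("15.5%"))
```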

Is the share of individuals below the poverty level related to median household income per state? The two features show a strong negative correlation, but we keep both for further analysis.

Correlation: -0.7561275204879928
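The figure above is a Pearson correlation coefficient, which can be computed directly from its definition. The sketch below uses only the five states shown in the tables of this section, so it will not reproduce the value obtained on the full dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Alabama, Alaska, Arizona, Arkansas, California (sample rows only).
income = [48486, 76715, 56213, 45726, 71228]
poverty = [0.155, 0.101, 0.135, 0.162, 0.118]
print(pearson(income, poverty))
```

A negative value means that states with a higher median income tend to have a lower share of individuals below the poverty level.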

Merge previous datasets with individuals below poverty rate data

City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
0 New York City[d] New York 8336817 301.5 28.317 40.6635 -73.9387 65323 0.130
1 Los Angeles California 3979576 468.7 8.484 34.0194 -118.4108 71228 0.118
2 San Diego California 1423851 325.2 4.325 32.8153 -117.1350 71228 0.118
3 San Jose California 1021795 177.5 5.777 37.2967 -121.8189 71228 0.118
4 Chicago Illinois 2693976 227.3 11.900 41.8376 -87.6818 63575 0.115

Cities over 1M population segmentation with K-Means

  • Pre-processing: drop the columns with discrete variables.
  • Population density and land area are kept as features for clustering.
  • Income and poverty rate would introduce a bias in this case, as most of the cities belong to the same states (Texas and California), so they are not used in clustering.
  • Population is not used because it is determined by the other two features (population = density × land area).

Our hypothesis is that a higher population density favors a more compact distribution of services, which is better for the logistics system (shorter distances between customers and more of them).

Land area (sq mi) Population density/sq mi
0 301.5 28.317
1 468.7 8.484
2 325.2 4.325
3 177.5 5.777
4 227.3 11.900

Normalization is a statistical method that helps distance-based algorithms treat features with different magnitudes and distributions equally. We use StandardScaler() to rescale each feature to zero mean and unit variance.

Figure: Not normalized values

Figure: Normalized values

For the K-means algorithm, we need to input a K value. The KElbowVisualizer from Yellowbrick implements the “Elbow” method to select the optimal number of clusters by fitting the model with a range of values for K. The optimal K value occurs at the inflection on the curve and is shown with a dashed line.

Figure: K-means elbow curve

Cluster City City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
0 0 New York City[d] New York 8336817 301.5 28.317 40.6635 -73.9387 65323 0.130
1 1 Los Angeles California 3979576 468.7 8.484 34.0194 -118.4108 71228 0.118
2 1 San Diego California 1423851 325.2 4.325 32.8153 -117.1350 71228 0.118
3 1 San Jose California 1021795 177.5 5.777 37.2967 -121.8189 71228 0.118
4 2 Chicago Illinois 2693976 227.3 11.900 41.8376 -87.6818 63575 0.115

Visualizing the 4 clusters

Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
Cluster City
0 8.336817e+06 301.50 28.317000 40.663500 -73.938700 65323.00 0.13000
1 2.141741e+06 323.80 6.195333 34.710467 -119.121567 71228.00 0.11800
2 2.139020e+06 180.75 11.791500 40.923500 -81.407550 61510.00 0.11750
3 1.723022e+06 489.25 3.459250 31.406125 -100.693150 58730.75 0.13575

Cluster 0 - Cities

Cluster City City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
0 0 New York City[d] New York 8336817 301.5 28.317 40.6635 -73.9387 65323 0.13

Cluster 1 - Cities

Cluster City City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
1 1 Los Angeles California 3979576 468.7 8.484 34.0194 -118.4108 71228 0.118
2 1 San Diego California 1423851 325.2 4.325 32.8153 -117.1350 71228 0.118
3 1 San Jose California 1021795 177.5 5.777 37.2967 -121.8189 71228 0.118

Cluster 2 - Cities

Cluster City City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
4 2 Chicago Illinois 2693976 227.3 11.900 41.8376 -87.6818 63575 0.115
9 2 Philadelphia[e] Pennsylvania 1584064 134.2 11.683 40.0094 -75.1333 59445 0.120

Cluster 3 - Cities

Cluster City City State Population 2019 Land area (sq mi) Population density/sq mi lat long Income Individuals Below Poverty Level
5 3 Houston[3] Texas 2320268 637.5 3.613 29.7866 -95.3909 59570 0.136
6 3 San Antonio Texas 1547253 461.0 3.238 29.4724 -98.5251 59570 0.136
7 3 Dallas Texas 1343573 340.9 3.866 32.7933 -96.7665 59570 0.136
8 3 Phoenix Arizona 1680992 517.6 3.120 33.5722 -112.0901 56213 0.135

According to the clustering:

  • The Cluster 0 city has the highest population density.
  • Cluster 1 cities have medium population density.
  • Cluster 2 cities have high population density.
  • Cluster 3 cities have low population density.
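
The density ranking in this list can be checked from the per-city rows of the cluster tables. The sketch below is a plain-Python stand-in for a group-by-and-mean and is not the notebook's own code. Density values are copied as listed in the tables (they appear to be in thousands of people per square mile), and city names are written without the footnote markers.

```python
from collections import defaultdict
from statistics import fmean

# (cluster label, city, population density) copied from the cluster tables.
rows = [
    (0, "New York City", 28.317),
    (1, "Los Angeles", 8.484),
    (1, "San Diego", 4.325),
    (1, "San Jose", 5.777),
    (2, "Chicago", 11.900),
    (2, "Philadelphia", 11.683),
    (3, "Houston", 3.613),
    (3, "San Antonio", 3.238),
    (3, "Dallas", 3.866),
    (3, "Phoenix", 3.120),
]

# Collect the densities of each cluster, then average them.
by_cluster = defaultdict(list)
for cluster, _city, density in rows:
    by_cluster[cluster].append(density)

mean_density = {c: fmean(v) for c, v in by_cluster.items()}

# Cluster labels ordered from densest to least dense.
ranking = sorted(mean_density, key=mean_density.get, reverse=True)
```

The resulting order (cluster 0, then 2, then 1, then 3) matches the highest, high, medium and low labels used in the list.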

Initially, clusters 1 and 2 seem to be more favorable for reduced-cost physical distribution.

Analysis of the number of possible customers (supermarkets and gourmet shops) in clusters 1 and 2

Steps (same for each city):

  • Data analysis and preparation
  • Visualization of neighborhoods in scope
  • Use of the Foursquare API to search for supermarkets and gourmet stores
  • Clean up of Foursquare data
  • Visualization of possible customers

NYC - Cluster 0

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

Data analysis and preparation

Borough Neighborhood Latitude Longitude
0 Bronx Wakefield 40.894705 -73.847201
1 Bronx Co-op City 40.874294 -73.829939
2 Bronx Eastchester 40.887556 -73.827806
3 Bronx Fieldston 40.895437 -73.905643
4 Bronx Riverdale 40.890834 -73.912585

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(7233, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Wakefield 40.894705 -73.847201 Gourmet Express Café
1 Wakefield 40.894705 -73.847201 Gourmet Deli Express Deli / Bodega
2 Wakefield 40.894705 -73.847201 GN Gourmet Deli Deli / Bodega
3 Wakefield 40.894705 -73.847201 Green Gourmet Deli Deli / Bodega
4 Wakefield 40.894705 -73.847201 T&T Gourmet Deli Deli / Bodega
(8426, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Wakefield 40.894705 -73.847201 Fine Fair Supermarket Restaurant
1 Wakefield 40.894705 -73.847201 Associated Supermarket Supermarket
2 Wakefield 40.894705 -73.847201 Caribbean Supermarket Caribbean Restaurant
3 Wakefield 40.894705 -73.847201 S.Y. West Indian Supermarket Supermarket
4 Wakefield 40.894705 -73.847201 Laconia Supermarket Market
(15659, 5)
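
The search step can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code. It assumes the legacy Foursquare v2 venues/search endpoint and a response shaped like {"response": {"venues": [...]}}. The helper names, the default version date, radius and limit, and the rule that venues without a category are labelled "Empty" are all hypothetical choices. No request is sent, and the payload is hand-written.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v2/venues/search"  # assumed legacy v2 endpoint

def build_search_url(lat, lng, query, client_id, client_secret,
                     version="20200101", radius=1000, limit=50):
    """Build a venue-search URL for one neighborhood centre (no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"

def venues_to_rows(neighborhood, lat, lng, payload):
    """Flatten a search response into (neighborhood, lat, lng, name, category) rows."""
    rows = []
    for venue in payload.get("response", {}).get("venues", []):
        categories = venue.get("categories") or []
        # Assumption: venues without a category are labelled "Empty".
        category = categories[0]["name"] if categories else "Empty"
        rows.append((neighborhood, lat, lng, venue["name"], category))
    return rows

# Hand-written payload imitating the assumed response shape.
sample_payload = {"response": {"venues": [
    {"name": "Associated Supermarket", "categories": [{"name": "Supermarket"}]},
    {"name": "Unlabelled venue", "categories": []},
]}}
sample_rows = venues_to_rows("Wakefield", 40.894705, -73.847201, sample_payload)
sample_url = build_search_url(40.894705, -73.847201, "supermarket", "ID", "SECRET")
```

Running one such search per neighborhood for each of the two queries ("gourmet" and "supermarket") and concatenating the rows would give a five-column table like the ones shown in this section.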

Clean up of Foursquare data

['Café',
 'Deli / Bodega',
 'African Restaurant',
 'Food',
 'American Restaurant',
 'Pizza Place',
 'Gourmet Shop',
 'Afghan Restaurant',
 'Convenience Store',
 'Restaurant',
 'Empty',
 'Grocery Store',
 'New American Restaurant',
 'Coffee Shop',
 'Food & Drink Shop',
 'Italian Restaurant',
 'Miscellaneous Shop',
 'Chinese Restaurant',
 'Supermarket',
 'Mexican Restaurant',
 'Office',
 'Breakfast Spot',
 'Sandwich Place',
 'Japanese Restaurant',
 'Salad Place',
 'Fruit & Vegetable Store',
 'BBQ Joint',
 'Cantonese Restaurant',
 'Butcher',
 'Bakery',
 'Ice Cream Shop',
 'Warehouse',
 'Market',
 'Food Truck',
 'Organic Grocery',
 'Empanada Restaurant',
 'Neighborhood',
 'Cupcake Shop',
 'Kosher Restaurant',
 'Bagel Shop',
 'Candy Store',
 'Food Service',
 'Fast Food Restaurant',
 'Caribbean Restaurant',
 'Dessert Shop',
 'Vegetarian / Vegan Restaurant',
 'Falafel Restaurant',
 'Pharmacy',
 'Bistro',
 'College Cafeteria',
 'Burger Joint',
 'Food Court',
 'Indian Restaurant',
 'Bookstore',
 'General Entertainment',
 'Asian Restaurant',
 'Eastern European Restaurant',
 'Gastropub',
 'Sushi Restaurant',
 'Park',
 'Flower Shop',
 'IT Services',
 'Diner',
 'Bar',
 'Wine Shop',
 'Factory',
 'Building',
 'Spanish Restaurant',
 'Taiwanese Restaurant',
 'Airport Food Court',
 'Snack Place',
 'Tea Room',
 'Cafeteria',
 'Halal Restaurant',
 'Hot Dog Joint',
 'Juice Bar',
 'Health Food Store',
 'Shop & Service',
 'Nail Salon',
 'Flea Market',
 'Discount Store',
 'Liquor Store',
 'Residential Building (Apartment / Condo)',
 'Fish Market',
 'Seafood Restaurant',
 'Automotive Shop',
 'Taco Place',
 'Financial or Legal Service',
 'Fish & Chips Shop',
 'Event Space',
 'Tech Startup',
 'Department Store',
 'Shopping Mall',
 'Sporting Goods Shop',
 'Newsstand',
 'Farmers Market',
 'Shipping Store']
['Café', 'African Restaurant', 'Food', 'American Restaurant', 'Pizza Place', 'Afghan Restaurant', 'Convenience Store', 'Restaurant', 'Empty', 'Grocery Store', 'New American Restaurant', 'Coffee Shop', 'Food & Drink Shop', 'Italian Restaurant', 'Miscellaneous Shop', 'Chinese Restaurant', 'Mexican Restaurant', 'Office', 'Breakfast Spot', 'Sandwich Place', 'Japanese Restaurant', 'Salad Place', 'Fruit & Vegetable Store', 'BBQ Joint', 'Cantonese Restaurant', 'Butcher', 'Bakery', 'Ice Cream Shop', 'Warehouse', 'Market', 'Food Truck', 'Organic Grocery', 'Empanada Restaurant', 'Neighborhood', 'Cupcake Shop', 'Kosher Restaurant', 'Bagel Shop', 'Candy Store', 'Food Service', 'Fast Food Restaurant', 'Caribbean Restaurant', 'Dessert Shop', 'Vegetarian / Vegan Restaurant', 'Falafel Restaurant', 'Pharmacy', 'Bistro', 'College Cafeteria', 'Burger Joint', 'Food Court', 'Indian Restaurant', 'Bookstore', 'General Entertainment', 'Asian Restaurant', 'Eastern European Restaurant', 'Gastropub', 'Sushi Restaurant', 'Park', 'Flower Shop', 'IT Services', 'Diner', 'Bar', 'Wine Shop', 'Factory', 'Building', 'Spanish Restaurant', 'Taiwanese Restaurant', 'Airport Food Court', 'Snack Place', 'Tea Room', 'Cafeteria', 'Halal Restaurant', 'Hot Dog Joint', 'Juice Bar', 'Health Food Store', 'Shop & Service', 'Nail Salon', 'Flea Market', 'Discount Store', 'Liquor Store', 'Residential Building (Apartment / Condo)', 'Fish Market', 'Seafood Restaurant', 'Automotive Shop', 'Taco Place', 'Financial or Legal Service', 'Fish & Chips Shop', 'Event Space', 'Tech Startup', 'Department Store', 'Shopping Mall', 'Sporting Goods Shop', 'Newsstand', 'Farmers Market', 'Shipping Store']
array(['Deli / Bodega', 'Gourmet Shop', 'Supermarket'], dtype=object)
7726
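
The clean-up keeps only venues whose category is one of the target categories and then counts the remaining venues per neighborhood. A minimal sketch of that filter, using the ten Wakefield sample rows shown earlier and the three NYC target categories listed above (the variable names are hypothetical):

```python
from collections import Counter

TARGET_CATEGORIES = {"Deli / Bodega", "Gourmet Shop", "Supermarket"}

# (neighborhood, venue name, category) rows from the Wakefield samples.
venues = [
    ("Wakefield", "Gourmet Express", "Café"),
    ("Wakefield", "Gourmet Deli Express", "Deli / Bodega"),
    ("Wakefield", "GN Gourmet Deli", "Deli / Bodega"),
    ("Wakefield", "Green Gourmet Deli", "Deli / Bodega"),
    ("Wakefield", "T&T Gourmet Deli", "Deli / Bodega"),
    ("Wakefield", "Fine Fair Supermarket", "Restaurant"),
    ("Wakefield", "Associated Supermarket", "Supermarket"),
    ("Wakefield", "Caribbean Supermarket", "Caribbean Restaurant"),
    ("Wakefield", "S.Y. West Indian Supermarket", "Supermarket"),
    ("Wakefield", "Laconia Supermarket", "Market"),
]

# Keep only rows whose category is a target, then count per neighborhood.
targets = [row for row in venues if row[2] in TARGET_CATEGORIES]
targets_per_neighborhood = Counter(row[0] for row in targets)
```

Filtering on the category rather than on the venue name avoids counting restaurants that merely have "Gourmet" or "Supermarket" in their name.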

Visualization of possible customers

Neighborhood Num Targets Borough Latitude Longitude
0 Allerton 48 Bronx 40.865788 -73.859319
1 Arlington 1 Staten Island 40.635325 -74.165104
2 Arrochar 3 Staten Island 40.596313 -74.067124
3 Arverne 4 Queens 40.589144 -73.791992
4 Astoria 36 Queens 40.768509 -73.915654

Chicago - Cluster 2

Data analysis and preparation

Neighborhood Community area
0 Albany Park Albany Park
1 Altgeld Gardens Riverdale
2 Andersonville Edgewater
3 Archer Heights Archer Heights
4 Armour Square Armour Square
Neighborhood Community area Full address
0 Albany Park Albany Park Albany Park , Chicago, Illinois, United States...
1 Altgeld Gardens Riverdale Altgeld Gardens , Chicago, Illinois, United St...
2 Andersonville Edgewater Andersonville , Chicago, Illinois, United Stat...
3 Archer Heights Archer Heights Archer Heights , Chicago, Illinois, United Sta...
4 Armour Square Armour Square Armour Square , Chicago, Illinois, United Stat...
Neighborhood Community area lat long
0 Albany Park Albany Park 41.971937 -87.716174
1 Altgeld Gardens Riverdale 41.655259 -87.609584
2 Andersonville Edgewater 41.977139 -87.669273
3 Archer Heights Archer Heights 41.811422 -87.726165
4 Armour Square Armour Square 41.840033 -87.633107

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(1599, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Albany Park 41.971937 -87.716174 Cantepli Turkish Gourmet Turkish Restaurant
1 Albany Park 41.971937 -87.716174 Krispy Gourmet Popcorn Snack Place
2 Albany Park 41.971937 -87.716174 Mama Green's Gourmet Goodies Food Truck
3 Albany Park 41.971937 -87.716174 Mama Dang's Homemade Gourmet Vietnamese Restaurant
4 Albany Park 41.971937 -87.716174 Raw Gourmets International Vegetarian / Vegan Restaurant
(447, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Albany Park 41.971937 -87.716174 Sahar International Supermarket Supermarket
1 Albany Park 41.971937 -87.716174 Charlie's Supermarket Supermarket
2 Andersonville 41.977139 -87.669273 Thuong Xa My A Supermarket Supermarket
3 Andersonville 41.977139 -87.669273 Park To Shop Supermarket Supermarket
4 Andersonville 41.977139 -87.669273 Win Sing Supermarket Supermarket
(2046, 5)

Clean up of Foursquare data

['Turkish Restaurant',
 'Snack Place',
 'Food Truck',
 'Vietnamese Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Deli / Bodega',
 'Chinese Restaurant',
 'Wings Joint',
 'Food',
 'Taiwanese Restaurant',
 'Warehouse',
 'Asian Restaurant',
 'Mac & Cheese Joint',
 'Butcher',
 'Sandwich Place',
 'Event Space',
 'Miscellaneous Shop',
 'Fish Market',
 'Candy Store',
 'South American Restaurant',
 'Dessert Shop',
 'Ice Cream Shop',
 'Coffee Shop',
 'Office',
 'Italian Restaurant',
 'Grocery Store',
 'Bakery',
 'Indian Restaurant',
 'Café',
 'Event Service',
 'General Entertainment',
 'Restaurant',
 'Farmers Market',
 'Food & Drink Shop',
 'Sausage Shop',
 'American Restaurant',
 'French Restaurant',
 'Factory',
 'German Restaurant',
 'Empty',
 'Winery',
 'Other Great Outdoors',
 'Mexican Restaurant',
 'Middle Eastern Restaurant',
 'Building',
 'Convenience Store',
 'Gourmet Shop',
 'Supermarket',
 'Liquor Store',
 'Hardware Store']
['Turkish Restaurant', 'Snack Place', 'Food Truck', 'Vietnamese Restaurant', 'Vegetarian / Vegan Restaurant', 'Chinese Restaurant', 'Wings Joint', 'Food', 'Taiwanese Restaurant', 'Warehouse', 'Asian Restaurant', 'Mac & Cheese Joint', 'Butcher', 'Sandwich Place', 'Event Space', 'Miscellaneous Shop', 'Fish Market', 'Candy Store', 'South American Restaurant', 'Dessert Shop', 'Ice Cream Shop', 'Coffee Shop', 'Office', 'Italian Restaurant', 'Grocery Store', 'Bakery', 'Indian Restaurant', 'Café', 'Event Service', 'General Entertainment', 'Restaurant', 'Farmers Market', 'Food & Drink Shop', 'Sausage Shop', 'American Restaurant', 'French Restaurant', 'Factory', 'German Restaurant', 'Empty', 'Winery', 'Other Great Outdoors', 'Mexican Restaurant', 'Middle Eastern Restaurant', 'Building', 'Convenience Store', 'Gourmet Shop', 'Liquor Store', 'Hardware Store']
array(['Deli / Bodega', 'Supermarket'], dtype=object)
333

Visualization of possible customers

Neighborhood Num Targets Community area lat long
0 Albany Park 2 Albany Park 41.971937 -87.716174
1 Andersonville 8 Edgewater 41.977139 -87.669273
2 Armour Square 1 Armour Square 41.840033 -87.633107
3 Ashburn 1 Ashburn 41.747533 -87.711163
4 Avalon Park 2 Avalon Park 41.745035 -87.588658

Philadelphia - Cluster 2

Data analysis and preparation

Neighborhood
0 Avenue of the Arts
1 Callowhill
2 Chinatown
3 Elfreth's Alley
4 French Quarter
Neighborhood Full address
0 Avenue of the Arts Avenue of the Arts , Philadelphia, Pennsylvani...
1 Callowhill Callowhill , Philadelphia, Pennsylvania, Unite...
2 Chinatown Chinatown , Philadelphia, Pennsylvania, United...
3 Elfreth's Alley Elfreth's Alley , Philadelphia, Pennsylvania, ...
4 French Quarter French Quarter , Philadelphia, Pennsylvania, U...
Neighborhood lat long
1 Callowhill 39.967005 -75.236701
2 Chinatown 39.953446 -75.154622
3 Elfreth's Alley 39.952712 -75.141980
5 Logan Square 39.958125 -75.170560
6 Naval Square 39.943959 -75.184315

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(2229, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Callowhill 39.967005 -75.236701 Gourmet Candle Gourmet Shop
1 Callowhill 39.967005 -75.236701 International Gourmet Deli Deli / Bodega
2 Callowhill 39.967005 -75.236701 Jimmy John's Sandwich Place
3 Chinatown 39.953446 -75.154622 Shanghai Gourmet Chinese Restaurant
4 Chinatown 39.953446 -75.154622 Spruce Rana Gourmet Deli & Market Deli / Bodega
(1529, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Callowhill 39.967005 -75.236701 Rodriguez Supermarket Grocery Store
1 Callowhill 39.967005 -75.236701 cesario supermarket Grocery Store
2 Callowhill 39.967005 -75.236701 Shop Rite Supermarket Exhibit
3 Callowhill 39.967005 -75.236701 62nd & Master Supermarket Deli / Bodega
4 Callowhill 39.967005 -75.236701 Lucky 7 Supermarket Convenience Store
(3758, 5)

Clean up of Foursquare data

['Gourmet Shop',
 'Deli / Bodega',
 'Sandwich Place',
 'Chinese Restaurant',
 'Grocery Store',
 'Coffee Shop',
 'American Restaurant',
 'Food & Drink Shop',
 'Food',
 'Café',
 'Belgian Restaurant',
 'Market',
 'Latin American Restaurant',
 'Cafeteria',
 'General Entertainment',
 'Pet Store',
 'Convenience Store',
 'Empty',
 'Food Truck',
 'Dessert Shop',
 'Speakeasy',
 'Soup Place',
 'Burger Joint',
 'Pizza Place',
 'Food Court',
 'Snack Place',
 'Seafood Restaurant',
 'Hotpot Restaurant',
 'Restaurant',
 'Southern / Soul Food Restaurant',
 'Bakery',
 'Exhibit',
 'Caribbean Restaurant',
 'Supermarket',
 'Asian Restaurant',
 'Miscellaneous Shop',
 'Spanish Restaurant',
 'Rental Car Location',
 'Candy Store',
 'Bank',
 'Vietnamese Restaurant']
['Gourmet Shop', 'Sandwich Place', 'Chinese Restaurant', 'Grocery Store', 'Coffee Shop', 'American Restaurant', 'Food & Drink Shop', 'Food', 'Café', 'Belgian Restaurant', 'Market', 'Latin American Restaurant', 'Cafeteria', 'General Entertainment', 'Pet Store', 'Convenience Store', 'Empty', 'Food Truck', 'Dessert Shop', 'Speakeasy', 'Soup Place', 'Burger Joint', 'Pizza Place', 'Food Court', 'Snack Place', 'Seafood Restaurant', 'Hotpot Restaurant', 'Restaurant', 'Southern / Soul Food Restaurant', 'Bakery', 'Exhibit', 'Caribbean Restaurant', 'Asian Restaurant', 'Miscellaneous Shop', 'Spanish Restaurant', 'Rental Car Location', 'Candy Store', 'Bank', 'Vietnamese Restaurant']
array(['Deli / Bodega', 'Supermarket'], dtype=object)
625

Visualization of possible customers

Neighborhood Num Targets lat long
0 Academy Gardens 1 40.063016 -75.009777
1 Allegheny West 1 40.008446 -75.177956
2 Andorra 1 40.072611 -75.231290
3 Angora 3 39.944002 -75.237682
4 Ashton-Woodenbridge 1 40.061879 -75.017135

Los Angeles - Cluster 1

Data analysis and preparation

Neighborhood
0 Arts District
1 Bunker Hill
2 Chinatown
3 Civic Center
4 Fashion District
Neighborhood Full address
0 Arts District Arts District , Los Angeles, California, Unite...
1 Bunker Hill Bunker Hill , Los Angeles, California, United ...
2 Chinatown Chinatown , Los Angeles, California, United St...
3 Civic Center Civic Center , Los Angeles, California, United...
4 Fashion District Fashion District , Los Angeles, California, Un...
Neighborhood lat long
0 Arts District 34.041239 -118.234450
1 Bunker Hill 34.055066 -118.251223
2 Chinatown 34.063840 -118.235868
3 Civic Center 34.053691 -118.242767
4 Fashion District 34.036622 -118.259069

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(1663, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Arts District 34.041239 -118.23445 Gourmet Wines and Spirits Liquor Store
1 Arts District 34.041239 -118.23445 Gourmet Coffee Café
2 Arts District 34.041239 -118.23445 Combalache's Gourmet Food Truck
3 Arts District 34.041239 -118.23445 Gourmet Specialties Food
4 Arts District 34.041239 -118.23445 Steven's Gourmet Deli Sandwich Place
(352, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Arts District 34.041239 -118.234450 Far East Supermarket Grocery Store
1 Bunker Hill 34.055066 -118.251223 Far East Supermarket Grocery Store
2 Bunker Hill 34.055066 -118.251223 The Plaza Supermarket Empty
3 Chinatown 34.063840 -118.235868 Far East Supermarket Grocery Store
4 Civic Center 34.053691 -118.242767 Far East Supermarket Grocery Store
(2015, 5)

Clean up of Foursquare data

['Liquor Store',
 'Café',
 'Food Truck',
 'Food',
 'Sandwich Place',
 'Wine Shop',
 'Wine Bar',
 'Pizza Place',
 'Deli / Bodega',
 'Office',
 'Bakery',
 'Mexican Restaurant',
 'Moving Target',
 'Seafood Restaurant',
 'American Restaurant',
 'Coffee Shop',
 'Restaurant',
 'Chinese Restaurant',
 'Convenience Store',
 'Fast Food Restaurant',
 'Empty',
 'Building',
 'Taco Place',
 'Miscellaneous Shop',
 'Speakeasy',
 'Food Stand',
 'Argentinian Restaurant',
 'Salad Place',
 'Gourmet Shop',
 'Kosher Restaurant',
 'Thai Restaurant',
 'Armenian Restaurant',
 'Farmers Market',
 'Japanese Restaurant',
 'Burger Joint',
 'Southern / Soul Food Restaurant',
 'Food Service',
 'Asian Restaurant',
 'Dessert Shop',
 'Cheese Shop',
 'Market',
 'Donut Shop',
 'Soup Place',
 'General Entertainment',
 'Pharmacy',
 'Food & Drink Shop',
 'Snack Place',
 'Supermarket',
 'New American Restaurant',
 'French Restaurant',
 'Hot Dog Joint',
 'Grocery Store',
 'Financial or Legal Service',
 'Shop & Service',
 'Laundry Service']
['Liquor Store', 'Café', 'Food Truck', 'Food', 'Sandwich Place', 'Wine Shop', 'Wine Bar', 'Pizza Place', 'Office', 'Bakery', 'Mexican Restaurant', 'Moving Target', 'Seafood Restaurant', 'American Restaurant', 'Coffee Shop', 'Restaurant', 'Chinese Restaurant', 'Convenience Store', 'Fast Food Restaurant', 'Empty', 'Building', 'Taco Place', 'Miscellaneous Shop', 'Speakeasy', 'Food Stand', 'Argentinian Restaurant', 'Salad Place', 'Kosher Restaurant', 'Thai Restaurant', 'Armenian Restaurant', 'Farmers Market', 'Japanese Restaurant', 'Burger Joint', 'Southern / Soul Food Restaurant', 'Food Service', 'Asian Restaurant', 'Dessert Shop', 'Cheese Shop', 'Donut Shop', 'Soup Place', 'General Entertainment', 'Pharmacy', 'Food & Drink Shop', 'Snack Place', 'New American Restaurant', 'French Restaurant', 'Hot Dog Joint', 'Grocery Store', 'Financial or Legal Service', 'Shop & Service', 'Laundry Service']
array(['Deli / Bodega', 'Gourmet Shop', 'Market', 'Supermarket'],
      dtype=object)
173

Visualization of possible customers

Neighborhood Num Targets lat long
0 Angelino Heights 1 34.070289 -118.254796
1 Arleta 4 34.241327 -118.432205
2 Arts District 1 34.041239 -118.234450
3 Balboa Park 1 34.185903 -118.501010
4 Beachwood Canyon 2 34.122292 -118.321384

San Diego - Cluster 1

Data analysis and preparation

Neighborhood
0 Balboa Park
1 Bankers Hill
2 Barrio Logan
3 Bay Ho
4 Bay Park
Neighborhood Full address
0 Balboa Park Balboa Park , San Diego, California, United St...
1 Bankers Hill Bankers Hill , San Diego, California, United S...
2 Barrio Logan Barrio Logan , San Diego, California, United S...
3 Bay Ho Bay Ho , San Diego, California, United States ...
4 Bay Park Bay Park , San Diego, California, United State...
Neighborhood lat long
0 Balboa Park 32.731357 -117.146527
1 Bankers Hill 32.728293 -117.162105
2 Barrio Logan 32.693886 -117.138007
4 Bay Park 32.784638 -117.202605
5 Birdland 32.792333 -117.154206

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(613, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Balboa Park 32.731357 -117.146527 Westgate Gourmet Wine & Delicatessen Deli / Bodega
1 Balboa Park 32.731357 -117.146527 Gourmet India Indian Restaurant
2 Balboa Park 32.731357 -117.146527 Gourmet on 5th French Restaurant
3 Balboa Park 32.731357 -117.146527 Gourmet Tamales Flea Market
4 Balboa Park 32.731357 -117.146527 Scott's Gourmet Sandwiches Sandwich Place
(170, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Balboa Park 32.731357 -117.146527 Ralphs Supermarket
1 Bankers Hill 32.728293 -117.162105 Ralphs Supermarket
2 Bankers Hill 32.728293 -117.162105 Starbucks Coffee Shop
3 Bay Park 32.784638 -117.202605 Keils Supermarket Grocery Store
4 Birdland 32.792333 -117.154206 Thuan Phat Supermarket Grocery Store
(783, 5)

Clean up of Foursquare data

['Deli / Bodega',
 'Indian Restaurant',
 'French Restaurant',
 'Flea Market',
 'Sandwich Place',
 'Juice Bar',
 'Cheese Shop',
 'Food Truck',
 'Ice Cream Shop',
 'Convenience Store',
 'Empty',
 'Food & Drink Shop',
 'Italian Restaurant',
 'Mexican Restaurant',
 'Bakery',
 'Cupcake Shop',
 'Taco Place',
 'Food',
 'Gourmet Shop',
 'Salad Place',
 'Cafeteria',
 'Thai Restaurant',
 'Restaurant',
 'Café',
 'Pizza Place',
 'Asian Restaurant',
 'Burger Joint',
 'Chinese Restaurant',
 'Distribution Center',
 'Coffee Shop',
 'Breakfast Spot',
 'Fast Food Restaurant',
 'Construction & Landscaping',
 'Food Service',
 'Seafood Restaurant',
 'Food Stand',
 'Supermarket',
 'Grocery Store',
 'Vietnamese Restaurant',
 'Market',
 'Financial or Legal Service']
['Indian Restaurant', 'French Restaurant', 'Flea Market', 'Sandwich Place', 'Juice Bar', 'Cheese Shop', 'Food Truck', 'Ice Cream Shop', 'Convenience Store', 'Empty', 'Food & Drink Shop', 'Italian Restaurant', 'Mexican Restaurant', 'Bakery', 'Cupcake Shop', 'Taco Place', 'Food', 'Salad Place', 'Cafeteria', 'Thai Restaurant', 'Restaurant', 'Café', 'Pizza Place', 'Asian Restaurant', 'Burger Joint', 'Chinese Restaurant', 'Distribution Center', 'Coffee Shop', 'Breakfast Spot', 'Fast Food Restaurant', 'Construction & Landscaping', 'Food Service', 'Seafood Restaurant', 'Food Stand', 'Grocery Store', 'Vietnamese Restaurant', 'Financial or Legal Service']
array(['Deli / Bodega', 'Gourmet Shop', 'Supermarket', 'Market'],
      dtype=object)
132

Visualization of possible customers

Neighborhood Num Targets lat long
0 Alta Vista 2 32.693339 -117.063154
1 Balboa Park 3 32.731357 -117.146527
2 Bankers Hill 3 32.728293 -117.162105
3 Barrio Logan 1 32.693886 -117.138007
4 Bay Terraces 2 32.691914 -117.036631

San Jose - Cluster 1

Data analysis and preparation

Neighborhood
0 The Alameda
1 Almaden Valley
2 Alum Rock
3 Alviso
4 Berryessa
Neighborhood Full address
0 The Alameda The Alameda , San Jose, California, United Sta...
1 Almaden Valley Almaden Valley , San Jose, California, United ...
2 Alum Rock Alum Rock , San Jose, California, United State...
3 Alviso Alviso , San Jose, California, United States o...
4 Berryessa Berryessa , San Jose, California, United State...
Neighborhood lat long
0 The Alameda 37.342670 -121.926298
1 Almaden Valley 37.221607 -121.861757
2 Alum Rock 37.366051 -121.827176
3 Alviso 37.426051 -121.975237
4 Berryessa 37.388280 -121.862349

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(98, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 The Alameda 37.342670 -121.926298 Atlantic Aviation (SJC) Airport Service
1 Alviso 37.426051 -121.975237 Katies Gourmet New American Restaurant
2 Alviso 37.426051 -121.975237 Fresh Gourmet Express Fast Food Restaurant
3 Berryessa 37.388280 -121.862349 Gourmet Palace Food Truck Food Truck
4 Buena Vista 37.321338 -121.916626 La Villa Delicatessen & Gourmet Shop Deli / Bodega
(94, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Almaden Valley 37.221607 -121.861757 Pw Supermarkets Bakery
1 Almaden Valley 37.221607 -121.861757 Lucky Supermarket
2 Almaden Valley 37.221607 -121.861757 Lucky Supermarket
3 Alum Rock 37.366051 -121.827176 Chavez Supermarket Grocery Store
4 Alum Rock 37.366051 -121.827176 Seafood City Supermarket
(192, 5)

Clean up of Foursquare data

['Airport Service',
 'New American Restaurant',
 'Fast Food Restaurant',
 'Food Truck',
 'Deli / Bodega',
 'Breakfast Spot',
 'Café',
 'Cupcake Shop',
 'Food Court',
 'Food Service',
 'Food',
 'Bakery',
 'Restaurant',
 'Empty',
 'Burger Joint',
 'Chinese Restaurant',
 'Mexican Restaurant',
 'Burrito Place',
 'Coffee Shop',
 'Grocery Store',
 'Supermarket',
 'Convenience Store',
 'Laundry Service']
['Airport Service', 'New American Restaurant', 'Fast Food Restaurant', 'Food Truck', 'Breakfast Spot', 'Café', 'Cupcake Shop', 'Food Court', 'Food Service', 'Food', 'Bakery', 'Restaurant', 'Empty', 'Burger Joint', 'Chinese Restaurant', 'Mexican Restaurant', 'Burrito Place', 'Coffee Shop', 'Grocery Store', 'Convenience Store', 'Laundry Service']
array(['Deli / Bodega', 'Supermarket'], dtype=object)
22

Visualization of possible customers

Neighborhood Num Targets lat long
0 Almaden Valley 2 37.221607 -121.861757
1 Alum Rock 2 37.366051 -121.827176
2 Berryessa 1 37.388280 -121.862349
3 Buena Vista 1 37.321338 -121.916626
4 Cambrian Park 1 37.256450 -121.931484

Houston - Cluster 3

Data analysis and preparation

Neighborhood
0 Willowbrook
1 Greater Greenspoint
2 Carverdale
3 Fairbanks
4 Northwest Crossing
Neighborhood Full address
0 Willowbrook Willowbrook , Houston, Texas, United States of...
1 Greater Greenspoint Greater Greenspoint , Houston, Texas, United S...
2 Carverdale Carverdale , Houston, Texas, United States of ...
3 Fairbanks Fairbanks , Houston, Texas, United States of A...
4 Northwest Crossing Northwest Crossing , Houston, Texas, United St...
Neighborhood lat long
0 Willowbrook 29.660254 -95.456096
1 Greater Greenspoint 29.944719 -95.416074
2 Carverdale 29.848687 -95.539450
3 Fairbanks 29.852726 -95.524386
4 Northwest Crossing 29.853820 -95.504597

Visualization of neighborhoods in scope

Use of the Foursquare API to search for supermarkets and gourmet stores

(211, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Willowbrook 29.660254 -95.456096 Perú Gourmet Restaurant Peruvian Restaurant
1 Willowbrook 29.660254 -95.456096 Wraps International Gourmet Food
2 Greater Greenspoint 29.944719 -95.416074 Touch of Gourmet Coffee Shop
3 Greater Greenspoint 29.944719 -95.416074 Rising Roll Gourmet Restaurant
4 Park Ten 29.761232 -95.378238 Gourmet Delights Deli Sandwich Place
(86, 5)
Neighborhood Neighborhood Latitude Neighborhood Longitude Name Venue Category
0 Carverdale 29.848687 -95.539450 Metal Supermarkets Houston Northwest Hardware Store
1 Fairbanks 29.852726 -95.524386 El Ahorro Supermarket # 2 Mexican Restaurant
2 Fairbanks 29.852726 -95.524386 Metal Supermarkets Houston Northwest Hardware Store
3 Northwest Crossing 29.853820 -95.504597 El Ahorro Supermarket # 2 Mexican Restaurant
4 Northwest Crossing 29.853820 -95.504597 Metal Supermarkets Houston Northwest Hardware Store
(297, 5)

Clean up of Foursquare data

['Peruvian Restaurant',
 'Food',
 'Coffee Shop',
 'Restaurant',
 'Sandwich Place',
 'Food Court',
 'Empty',
 'Dessert Shop',
 'Mediterranean Restaurant',
 'Indian Restaurant',
 'Bakery',
 'Sushi Restaurant',
 'Snack Place',
 'New American Restaurant',
 'Furniture / Home Store',
 'Deli / Bodega',
 'Gluten-free Restaurant',
 'Thai Restaurant',
 'Winery',
 'Distribution Center',
 'Asian Restaurant',
 'American Restaurant',
 'Burger Joint',
 'Breakfast Spot',
 'Market',
 'Salad Place',
 'Miscellaneous Shop',
 'Chinese Restaurant',
 'Italian Restaurant',
 'Hardware Store',
 'Mexican Restaurant',
 'Convenience Store',
 'Grocery Store',
 'Gas Station',
 'Building',
 'Pet Store',
 'Department Store',
 'Supermarket',
 'Middle Eastern Restaurant',
 'Butcher',
 'Fruit & Vegetable Store']
['Peruvian Restaurant', 'Food', 'Coffee Shop', 'Restaurant', 'Sandwich Place', 'Food Court', 'Empty', 'Dessert Shop', 'Mediterranean Restaurant', 'Indian Restaurant', 'Bakery', 'Sushi Restaurant', 'Snack Place', 'New American Restaurant', 'Furniture / Home Store', 'Gluten-free Restaurant', 'Thai Restaurant', 'Winery', 'Distribution Center', 'Asian Restaurant', 'American Restaurant', 'Burger Joint', 'Breakfast Spot', 'Salad Place', 'Miscellaneous Shop', 'Chinese Restaurant', 'Italian Restaurant', 'Hardware Store', 'Mexican Restaurant', 'Convenience Store', 'Grocery Store', 'Gas Station', 'Building', 'Pet Store', 'Middle Eastern Restaurant', 'Butcher', 'Fruit & Vegetable Store']
array(['Deli / Bodega', 'Market', 'Department Store', 'Supermarket'],
      dtype=object)
13

Visualization of possible customers

Neighborhood Num Targets lat long
0 Afton Oaks 1 29.731500 -95.453725
1 Edgebrook 1 29.640013 -95.253964
2 Gulfton 3 29.716173 -95.494948
3 IAH Airport 2 29.984142 -95.332986
4 Northside 1 29.784093 -95.357625

The results returned by the Foursquare API for the 7 cities analyzed are:

   City           # Targets
0  New York City  7726
1  Los Angeles    173
2  Chicago        333
3  Philadelphia   625
4  San Diego      132
5  San Jose       22
6  Houston        13

Merge current results with other city information

   City              # Targets  State         Population 2019  Land area (sq mi)  Population density/sq mi  lat      long
0  New York City[d]  7726       New York      8336817          301.5              28.317                    40.6635  -73.9387
1  Los Angeles       173        California    3979576          468.7              8.484                     34.0194  -118.4108
2  Chicago           333        Illinois      2693976          227.3              11.900                    41.8376  -87.6818
3  Philadelphia[e]   625        Pennsylvania  1584064          134.2              11.683                    40.0094  -75.1333
4  San Diego         132        California    1423851          325.2              4.325                     32.8153  -117.1350
5  San Jose          22         California    1021795          177.5              5.777                     37.2967  -121.8189
6  Houston[3]        13         Texas         2320268          637.5              3.613                     29.7866  -95.3909
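
Some city names in the population data still carry footnote markers ("[d]", "[e]", "[3]"), while the results table uses plain names, so a merge keyed on the city name needs the two spellings reconciled. The sketch below shows one way to make that key robust. It is an assumption for illustration, not the notebook's own merge (which kept the raw names), and the helper clean_city is hypothetical.

```python
import re

def clean_city(name):
    """Drop bracketed footnote markers such as "[d]" or "[3]" and trim spaces."""
    return re.sub(r"\[[^\]]*\]", "", name).strip()

# Target counts keyed by plain city name (from the results table).
targets = {"New York City": 7726, "Philadelphia": 625, "Houston": 13}

# City attributes keyed by the raw names, which still carry footnote markers.
city_info = {
    "New York City[d]": {"State": "New York", "density": 28.317},
    "Philadelphia[e]": {"State": "Pennsylvania", "density": 11.683},
    "Houston[3]": {"State": "Texas", "density": 3.613},
}

merged = {}
for raw_name, info in city_info.items():
    city = clean_city(raw_name)
    if city in targets:  # inner join on the cleaned name
        merged[city] = {"# Targets": targets[city], **info}
```

Cleaning the key before joining avoids silently dropping rows whose names differ only by a footnote marker.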

The following table and graphs show the correlation between the number of potential customers and population density.

                          # Targets  Population 2019  Land area (sq mi)  Population density/sq mi
# Targets                  1.000000         0.918909          -0.100833                  0.942816
Population 2019            0.918909         1.000000           0.147473                  0.890326
Land area (sq mi)         -0.100833         0.147473           1.000000                 -0.281304
Population density/sq mi   0.942816         0.890326          -0.281304                  1.000000

Correlation (# Targets vs. population density): 0.9428160363713494
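
The reported coefficient can be re-derived by hand from the seven rows of the merged table. The sketch below computes the Pearson correlation with the standard library only; the function name pearson is hypothetical, and this is a plain-Python stand-in rather than the notebook's own call.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Values for the 7 cities, in the order of the merged table.
targets = [7726, 173, 333, 625, 132, 22, 13]
density = [28.317, 8.484, 11.900, 11.683, 4.325, 5.777, 3.613]

r = pearson(targets, density)
```

The result should agree with the reported value up to rounding. With only seven points, and New York City far from the rest on both axes, the coefficient is dominated by that single city and is best read as an indication rather than a firm estimate.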

5.- Results and Discussion section -

We present our results following the proposed structure, treating Supply and Physical Distribution separately:

  • Supply areas: Given the production supply needs (milk, fruit, soy and tapioca), we should focus on the Midwestern and North-Eastern states of Iowa, Minnesota, New York, Pennsylvania and Wisconsin, as they hold a high number of organic operations with a balanced mix of the product types needed.
  • Physical Distribution: Following the requirements stated for physical distribution areas, the cities with the greatest potential are New York City, Chicago and Philadelphia, as their population density implies a higher number of potential customers and shorter distances between them.

Since supply operations and physical distribution must be managed within a single state to avoid differing sales taxes, and based on the previous clusterings, the strongest options are:

  • 1.- NY - New York City
  • 2.- Pennsylvania - Philadelphia

Next on the list, as weaker alternatives from a supply and/or physical distribution perspective, we could consider:

  • Illinois - Chicago
  • California - Los Angeles

The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually lead to a location that also takes other factors into account and meets all other relevant conditions.

Understanding Population Density and its impact

  • According to the United States Census Bureau, population density, in a broad sense, tells us how many people would live within one square mile if the U.S. population were evenly distributed across its land area. In reality, however, we know that population is not evenly distributed across space. People tend to cluster in cities, and those who live in rural areas are spread out across a much more sparsely populated landscape.

  • While the United States population density is about 90 people per square mile, most people live in cities, which have a much higher density. Even among cities, density values can vary considerably from one city to another.

  • When comparing population density values for different geographic areas, then, it is helpful to keep in mind that the values are most useful for small areas, such as neighborhoods. For larger areas (especially at the state or country scale), overall population density values are less likely to provide a meaningful measure of the density levels at which people actually live, but can be useful for comparing settlement intensity across geographies of similar scale.[3]

Our analysis gives preliminary evidence that the number of services (in our case supermarkets and gourmet shops) is strongly correlated with population density (a correlation of about 0.94), so focusing on high-density cities looks like a sound strategy for starting the ice cream venture. In line with the United States Census Bureau comments, the next step in defining the best logistics setup could be to reduce the grid and look in more detail at boroughs, community areas and neighborhoods, again using population density as a key feature.

Reducing the grid also helps to get more accurate results when using applications like Foursquare, since it allows a more precise exploration of each area and yields more meaningful insights.
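
To make the idea of "reducing the grid" more concrete, the sketch below generates the centers of a square grid around a city and compares a coarse grid with a finer one. It is only an illustration under stated assumptions: the `grid_centers` helper, the 10 km half-width and the 5 km / 1 km steps are hypothetical choices, the conversion from kilometres to degrees is a flat-earth approximation (about 111.32 km per degree of latitude), and the Philadelphia coordinates are the ones listed in the city table above. Each center could then serve as the origin of a venue search, which is not shown here.

```python
import math


def grid_centers(lat, lon, half_width_km, step_km):
    """Return (lat, lon) centers of a square grid around a point.

    Uses an approximate km-to-degree conversion, which is adequate for
    city-scale grids but is not a proper map projection.
    """
    km_per_deg_lat = 111.32
    km_per_deg_lon = 111.32 * math.cos(math.radians(lat))
    n = int(half_width_km // step_km)  # number of steps on each side of the center
    centers = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            centers.append((
                lat + i * step_km / km_per_deg_lat,
                lon + j * step_km / km_per_deg_lon,
            ))
    return centers


# Philadelphia coordinates as listed in the city table above.
coarse = grid_centers(40.0094, -75.1333, half_width_km=10, step_km=5)
fine = grid_centers(40.0094, -75.1333, half_width_km=10, step_km=1)

print(len(coarse), "coarse cells vs", len(fine), "fine cells")
```

A smaller step multiplies the number of cells to explore, so the finer grid trades more queries for a more detailed picture of each neighborhood.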

6.- Conclusion -

The purpose of this project was to identify the best States in which a Handcrafted Organic ice cream manufacturer could place a production center so as to optimize logistics costs.

By calculating the density distribution of potential customers from Foursquare data, we first identified general areas that justify further analysis, and then generated an extensive collection of locations which satisfy some basic requirements. Those zone centers were created to be used as starting points for the final exploration by stakeholders.

The final decision on the optimal location will be made by stakeholders based on the specific characteristics of each location, taking into consideration additional factors such as proximity to major roads, real estate availability, prices, and social and economic dynamics.

7.- References: